Index Pruning and Result Reranking: Effects on Ad-Hoc Retrieval and Named Page Finding
Abstract
We describe experiments conducted for the TREC 2006 Terabyte track. Our experiments center on two concepts: static index pruning (for increased retrieval efficiency) and result reranking (for improved precision). We investigate their effect on retrieval efficiency and effectiveness, paying special attention to the difference between ad-hoc retrieval and named page finding. We show that index pruning and reranking based on relevance models can be beneficial in an ad-hoc retrieval setting, but have a disastrous effect on the effectiveness of named page finding. Result reranking based on anchor text, on the other hand, is very useful for named page finding, but should not be used for ad-hoc retrieval. This dichotomy poses a problem for search engines, as there is no easy way for a search engine to decide whether a given query represents an ad-hoc retrieval task, intended to satisfy an abstract information need, or a named page finding task, targeting a specific document.
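Static index pruning removes postings from the inverted index ahead of query time, so queries have shorter lists to process. The abstract does not say which pruning criterion was used, so the following is only a minimal sketch assuming a common term-centric variant: for every term, keep the k postings with the highest score, with raw term frequency standing in for a real impact score. The index layout, the functions `prune_index` and `score`, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory inverted index: term -> list of (doc_id, term_frequency),
# kept in doc_id order.
Postings = List[Tuple[int, int]]
Index = Dict[str, Postings]


def prune_index(index: Index, k: int) -> Index:
    """Illustrative term-centric static pruning: keep only the k postings with
    the highest term frequency in each term's list (a stand-in for a real
    impact score such as BM25)."""
    pruned: Index = {}
    for term, postings in index.items():
        # Rank postings by term frequency (descending); break ties by doc_id.
        top = sorted(postings, key=lambda p: (-p[1], p[0]))[:k]
        # Restore doc_id order so query-time list processing is unchanged.
        pruned[term] = sorted(top)
    return pruned


def score(index: Index, query: List[str]) -> Dict[int, int]:
    """Toy query evaluation: sum the term frequencies of the query terms."""
    scores: Dict[int, int] = {}
    for term in query:
        for doc_id, tf in index.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0) + tf
    return scores


if __name__ == "__main__":
    index: Index = {
        "terabyte": [(1, 3), (2, 1), (3, 5), (4, 2)],
        "track": [(1, 1), (3, 2), (4, 4)],
    }
    pruned = prune_index(index, k=2)
    print(pruned)  # {'terabyte': [(1, 3), (3, 5)], 'track': [(3, 2), (4, 4)]}
    print(score(pruned, ["terabyte", "track"]))  # {1: 3, 3: 7, 4: 4}
```

In this toy run, document 2 loses all of its postings and can no longer be retrieved at all. That gives one intuition for why pruning may cost little when many documents are relevant (ad-hoc retrieval) but hurt badly when the query targets one specific page (named page finding).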
Related papers
THUIR at TREC 2005 Terabyte Track
This year, the IR group of Tsinghua University used its TMiner text retrieval system for indexing and retrieval in the Terabyte track ad hoc and named-page subtasks. In doing the two tasks, we used the in-link anchor texts (the anchor of the URLs that point to the current page in the collection) together with the content texts of the web pages for building the indices. When retrieving, the word-...
National Taiwan University at Terabyte Track of TREC 2005
There are three tasks in the Terabyte track of TREC 2005, i.e. Efficiency, Ad hoc and Named page finding. We participated in all the tasks and used different retrieval methods to deal with each task, aiming to vary the retrieval method according to the different characteristics of different tasks. In the ad hoc task, we adopt the technique of query-specific clustering. In the named page finding task, we...
Finding Task 4.1 Motivation and Description of Our Method
There are three tasks in the Terabyte track of TREC 2005, i.e. Efficiency, Ad hoc and Named page finding. We participated in all the tasks and used different retrieval methods to deal with each task, aiming to vary the retrieval method according to the different characteristics of different tasks.
A Comparative Study of Probabilistic and Language Models for Information Retrieval
Language models for information retrieval have received much attention in recent years, with many claims being made about their performance. However, previous studies evaluating the language modelling approach for information retrieval used different query sets and heterogeneous collections, which make reported results difficult to compare. This research is a broad-based study that evaluates la...
Publication date: 2006